Detection of trend changes in time series using Bayesian inference 
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Change points in time series are perceived as isolated singularities where two regular trends of 
a given signal do not match. The detection of such transitions is of fundamental interest for the 
understanding of the system's internal dynamics. In practice observational noise makes it difhcult 
to detect such change points in time series. In this work we elaborate a Bayesian method to estimate 
the location of the singularities and to produce some confidence intervals. We validate the ability 
and sensitivity of our inference method by estimating change points of synthetic data sets. As an 
application we use our algorithm to analyze the annual flow volume of the Nile River at Aswan from 
1871 to 1970, where we confirm a well-established significant transition point within the time series. 

PACS numbers: 02.50.Tt, 02.50.Cw, 05.45. Tp, 92.70. Kb 



I. INTRODUCTION 

The estimation of change points challenges analysis 
methods and modeling concepts. Commonly change 
points are considered as isolated singularities in a reg- 
ular background indicating the transition between two 
regimes governed by different internal dynamics. In time 
series analysts focus on change points in observed data 
to reveal dynamical properties of the system under study 
and to infer on possible correlations between subsystems. 
Detecting trend changes within various data sets is under 
intensive investigation in numerous research disciplines, 
such as palaeo-climatology yj (2j , ecology [31 S] , bioinfor- 
matics [3 [6] and economics [H [8] . 

In general, the detection of transition points is adressed 
via (i) regression or (ii) spectral analysis methods 
[TUj , (iii) Bayesian approaches [11] [T^] or (iv) recurrence 
network techniques [13l [Mj . 

In this work we formulate transition points not only in 
terms of the underlying regular dynamics, but also as 
a transition in the heteroscedastic noise level. We use 
Bayesian inference to produce estimates for all relevant 
parameters. Our signal model is described by a regular 
mean undergoing a sudden change and a heteroscedastic 
fluctuation which undergoes as well a sharp transition 
at the same time point. Thus, in its simplest form, the 
observed signal y has a linear trend undergoing a break 
point ^ at a time point ti — 9. The posterior density 
p{9\y) of the change point given the signal enables us 
to derive the point estimate 9 as the most likely break 
point and its confidence bounds. By applying a sliding 
window, we formally localize the posterior density and 
the modelling of the subsignals as a linear trend is valid 
in first order. Consequently we investigate time series 
globally and locally for a generalized break point in the 
signal's statisitical properties. 

In comparison to established methods (e.g. (ii) multi- 
scale spectral analysis [TUl) our technique is not restricted 
on a uniform time grid (e.g. as required for filtering meth- 
ods). The majority of existing methods require addi- 
tional approaches to interpret the confidence of the out- 



come (e.g. (i) bootstrapping, (ii) test statistics, (iv) 
introducing measures). Whereas our technique provides 
the confidence intervals of the estimates as a byprod- 
uct in a natural way. This, for us is actually the most 
convincing argument to approach the detection task via 
Bayesian inference since besides the parameter estima- 
tion on its own, we obtain a degree of belief about our 
assumed model and about the uncertainties in the param- 
eters [TSHI3- Existing techniques addressing Bayesian 
inference (iii) approach on the one hand the plain local- 
ization task of the singularity by treating the remaining 
model's parameter as hidden pElll9) . On the other hand 
hierarchical Bayesian models are used [TT] mainly based 
on Monte-Carlo-expectation-maximization (MEMC) al- 
gorithms for the estimation process [51 [O] • 
In contrast, we intend to achieve an insight in the pa- 
rameter structure of the time series. We intend to detect 
multiple change points without enlarging the model's di- 
mensionality, since this increases considerably the com- 
putational time. By addressing the general framework of 
linear mixed models (LMM) [2^ we are able to factorize 
the joint posterior density into a family of parametrized 
Gaussians. This mirrors the separation of the linear from 
the non-linear parts and it simplifies considerably the ex- 
plicit computation of the marginal distributions. Our 
technique will be applied to a hydrological time scries of 
the river Nile, which exhibits a well known change point. 



II. DEFINITION OF THE MODEL 

In our modeling approach we consider two aspects of 
change points in a time series. On the one hand, a change 
point is commonly associated with a sudden change of lo- 
cal trend in the data. This indicates a transition point 
between two regimes governed by two different internal 
dynamics. On the other hand we assume that the system- 
atic evolution of the local variability of the data around 
its average value undergoes a sudden transition at the 
change point. As we will show, both aspects can be com- 
bined into a linear mixed model with hyperparameters. 
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Our formulation allows the separation of the Gaussian 
from the intrinsic non- linear parts of the estimation prob- 
lem, which besides clarifying the structure of the model, 
speeds up computations considerably. 



A. Formulation of the linear mixed model 

The simplest type of signal undergoing a change point 
at time 9 can be expressed as 

y{t) - /3o + - i|- + I32\e- t\+ + m ■ (1) 

Here we use the elementary Hockey sticks of first order 
defined through 




and 
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Figure 1: Realization of a synthetic time series of Uobs = 
100 data points generated by Equ.Q whereas the mean is 
parametrized by Fgl3 = 5-l-0.22<* +0.08-C+ and the deviation 
is modeled as cr^f^e,^ = [1.6(1 + 0.2 ■ C- +0.1 ■ C,l)Y ■ 
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Natural data series can in general not be modeled by such 
a simple behavior as given by these functions. Therefore 
we add some random fluctuations ^ around the mean 
behavior. These random fluctuations can be due to mea- 
surement noise as well as to some intrinsic variability, 
which is not captured by the low dimensional mean dy- 
namics on both sides of the change point 6. For this 
fluctuating part of the signal we suppose that its am- 
plitude is essentially constant around the change point. 
The intrinsic variability however may, like the mean be- 
havior of the system itself, undergo a sudden change in 
its evolution of amplitude. Hence we consider stochastic 
fluctuations ^(t) whose amplitudes undergo a transition 
themselves according to 



STD(^(<)) = cr(l + si|t-i 



S2\t - 



(4) 



The scale factor a could be the level of the measurement 
noise or some background level of the intrinsic fluctua- 
tions, whereas the constants Si^2 describe the systematic 
evolution of the models intrinsic variability prior and af- 
ter the change point measured in units of a. Although 
clearly the fluctuating part may contain coherent parts, 
we assume that throughout this work, that the fluctu- 
ations are Gaussian random variables, wich at different 
time points are uncorrelated 



E(C(i)e(i'))=0, t^t'. 



(5) 



This clearly is an approximation and its validity can be 
questioned in concrete applications. However this as- 
sumption allows us to implement highly efficient algo- 
rithms for the estimation of the involved parameters. 
From now on we will call this fluctuating part simply 
"noise" . A realization of such a time series is presented 



in FigjT] Given a data set of n time points t^, i = 1, . . . , n, 
the observation vector y — [s{ti)Y S M" can be written 
as follows 



(6) 



Here the fixed effect vector /3 = {Pq, Pi, ^2)'^ € K"^ corre- 
sponds to the coefficients of the linear combination of the 
Hockey sticks modeling the mean behavior. The system 
matrix of the fixed effects, F S K"^'^, is then give n by 
the sampling of the Hockey sticks C,± defined in Equ. (23) 
at the observation points 



F„ 



(7) 

The noise ^ G M" is a Gaussian random vector with zero 
mean and covariance matrix e M"^", 



I - (0, CT^f]) 



(8) 



The covariance itself is structured noise, which is 
parametrized by the two slope parameters s — (§1,52) 
and the change point 6 itself as 



1 2 



• 5,. 



(9) 



In conclusion, the probability density of the observations 
for fixed parameters (i.e. fixed effects, change point, slope 
parameters) can be written as 



y r-. M (F$,a^n 



(10) 



3 



The Likelihood function of the parameters given the data 
can then be written as 



C{(3,a,s,e\y) = 



1 



(27rcr2)f 



(11) 

Note that the functional dependency of /3 is a Gaussian 
density. Clearly in the exponential /3 is of a quadratic 
form and since S = F^rt~^F is positive definite we may 
write 



1 



(27r(j2 



(12) 



where the mode of the Gaussian in (3 is the best linear 
unbiased predictor of the fixed effects (BLUP) [21] 



(3* ^ SiTginm{y - F(3fn-\y - F(3) 

= {F^n~^F)-^F^n-^y 



(13) 



and the residuum TZ measured in the Mahalanobis dis- 
tance [22\ ■ induced by the covariance matrix fi, is 

7^2 = mm{y - Fl3 fn-^{y - Ff3) 



{y - Ff3*fn-\y - F(3* 



(14) 



In addition, the profiled Likelihood function 
C{f3* ,a, s,9\y) enables us to derive the profiled 
Likelihood estimator of the scale parameter a 



7^2 



1 ' 



(15) 



which is auxiliary for the computation of the maximum 
of the Likelihood function. 



B. Bayesian inversion 

In the light of the Bayesian theorem, we can compute 
the posterior distribution p{f3,a,0,s\y) of the modeling 
parameters given the data y from the Likelihood func- 
tion Eq. (Ill by specifying the prior distribution of the 



parameters p{f3,a,9, s), which encodes our belief about 
the parameters prior to any observation. Since we assume 
a priori no correlations between the parameters, the joint 
prior distribution can be factorized into the independent 
parts 



(16) 



In general, we do not have any a priori knowledge about 
these hyperparameters and thus we shall use flat and 
uninformative priors [231 HI] 

p{9)^l, pis)^l, pif3)^l, (17) 

For the scale parameter a we assume a Jeffrey's prior [25] 

1 



p{a) 



(18) 



These statisitical assumptions enable us to compute the 
posterior density of the system's parameters given the 
data y as 



p{(3,a,e,s\y)^C-C{f3,a,0,s\y) 



1 



(19) 



The normalization constant C ensures that the right 
hand side actually defines a normalized probability den- 
sity. From this expression, various marginal posterior 
distributions may be obtained by integrating over the 
parameters that shall not be considered. We are mostly 
interested in the posterior distribution of the possible 
change point locations 9. To produce the posterior dis- 
tribution of this quantity, we have to marginalize out all 
other variables. It turns out that all but the integral over 
the noise slopes s may be carried out explicitely. Thanks 
to the Gaussian nature of the /3 dependency we obtain 



p{a,e,s\y) 



and 



p(/3,^,s|y) 



[{y-F(3rn-'{y-F(3)]- 



Further marginalization may be performed to yield 
Pi&,s\y) = J dadf3 p{(3,a,e,s\y) 

■Jl-{n-2) 



c ■ 



^\n\\FTn~^F\ 



(20) 
(21) 

(22) 
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Again C is a constant, that ensures the normalization of 
the right hand side to a probability density. Finally the 
posterior marginal distribution of 6 can be computed by 
numeric evaluation of the following integral 



v{e\y) 



ds p{6,s\y) 



(24) 



In the same way the numeric 9 integral may be per- 
formed to elaborate the posterior information about the 
involved slope parameters s of the heteroscedastic behav- 
ior around the change point 



p{s\y) = J d9p[9,8\y) 



III. VALIDATION THE METHOD 



(25) 



In order to validate the method's performance in an 
idealized setting we use synthetic time series to discuss its 
ability to estimate the model's parameters and to elab- 
orate the sensitivity of the estimates to data loss. We 
generate the time series via the LMM Equ.Q and infer 
on the change point by computing the global marginal 
posterior density Equ.(24), i.e. over the interval of all 
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Figure 2: Normalized marginal posterior densities for the time series in Fig. [T] The maxima indicate the most probable 
estimates of the a) change point 9 = 40.5, b) fixed effect offset /3o = 4.40 and c) scale parameter a = f .50. The dashed lines 
represent the true parameter values of the underlying model. 



possible change point values 9. The location of the max- 
imum of the marginal posterior density [j'(^|y)]niax "^^^ 
be used as an estimator for the most probable location of 
a singularity 0. In case, the data contains more than one 
change point, the posterior distribution will exhibit mul- 
tiple local maxima. This could therefore be used as an 
indicator for the existence of secondary change points in 
the time series. Although a more reasonable way would 
be to consider models with multiple change points this 
approach becomes quickly uncomputable due to explod- 
ing dimensionality. Thus we propose a local kernel based 
method to be able to apply our single change point model 
locally to multi change point data series. 



A. Estimation of a single change point 

To validate our technique, we apply it to the generated 
time series of Figjl] containing a single change point at 
6 = 40. We compute all relevant two and one dimen- 
sional marginal distributions of the model's parameters 
using the formulas of the previous section. The marginal 
distributions provide Bayesian estimates for the change 
point 9, mean behavior /3, scale parameter a and het- 
eroscedastic behavior s of the data as the maxima of the 
one and two dimensional marginal distributions shown in 

First note that due to the random nature of the observa- 
tions, the posterior density too depends randomly on the 
actual series of observations. It is therefore not surpriz- 
ing, that the locations of the maxima of the posterior 
does not exactly agree with the true parameter values. 
However, they are within a certain quantile of the pos- 
terior distribution. We automatically obtain confidence 
intervals or regions by considering those level intervals or 
contour-lines, that enclose a fixed percentage of the to- 
tal probability. This yields a natural way of uncertainty 
quantification. 

The estimated change point = 40.5 differs only lit- 
tle from the real value 9 = 40.0 within a relatively 
narrow and symmetric confidence interval [35.7, 45.9] 



(Figj2^). Consequently we achieve to restrict the loca- 
tion of a probable singularity to a range < 9% of the 
time grid. The estimates of the mean behavior are ob- 
tained from FigjijD, (sji as /3 = (4.40, 0.206, 0.096). 
The Bayesian estimates reproduce the real underlying 
mean model /3 = (5.0, 0.22, 0.08) convincingly. The 
two dimensional contour plot of the marginal density 
piPii P2\y) of the mean slopes indicate an approximate 
symmetric confidence area of the most probable slope 
combinations (/3i,/32) (red area in Fig [3^). The one 
dimensional projection p(/3i|y) reveals a broader confi- 
dence interval for the estimation of /3i compared to /32. 
The scale parameter can be estimated as ct = 1.50 from 
Figj2j: within the confidence interval [0.806 , 2.84] unidi- 
rectional wider to growing cr- values and differs little from 
the true value a = 1.60. The two dimensional contour 
plot of the marginal density of the deviation slope param- 
eters p{si, S2\y) indicate a slight asymmetric confidence 
area of the most probable slope combinations (si,S2) 
(red area in Figmj). The one dimensional projections 
p{si\y) and p{s2\y) display unidirectional wider confi- 
dence bounds for the estimates s"i = 0.087 to bigger and 
52 — 0.167 to smaller parameter values. 

Table I: Estimated model of the synthetic signal of Fig. [l] 
parameter estimate confidence > 95% 



e 


40.5 


[35.7,45.9] 




4.40 


[3.00,5.75] 




0.206 


[0.035,0.390] 


02 


0.096 


[-0.015,0.189 


a 


1.50 


[0.806,2.84] 


Si 


0.087 


[0.027,0.220] 


S2 


0.167 


[0.050,0.380] 



Thus for our realization, the marginal distributions of 
the heteroscedastic behavior (ct, si,S2) indicate a broad 
range of probable parameter combinations compared to 
the mean behavior (3 or the change point 9. In Tab|I] 
we summarize our point estimators and 95% confidence 
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Figure 3: Normalized two dimensional marginal posterior densities for the time series in Fig[T] The maxima indicate the 
most probable estimates of the a) fixed effect slopes (/3i,/32) = (0.206, 0.096) and b) deviation slope parameters (si,S2) = 
(0.087, 0.f67). Alongside the contour plots are presented the one dimensional projections of the posterior densities. The 
dashed lines represent the true values of the underlying model. 



intervals for them based on our analysis. 

1. Sensitivity to data loss 

In real data, analysts have to deal with sparse and 
irregularily sampled data. Our technique does not re- 
quire an uniform sampling grid of data points since from 
the beginning, it employs only the available data. As 
a validation for the sensitivity of our method to data 
loss, we randomly ignore stepwise 0% up to 87, 5% of 
the time series modeled by a sequence of Uobs = 200 ob- 
servations. The artifical time series undergo a change 
point = 80 and are further parametrized by the mean 
FgfS = 12 + 0.24 • C- + 0.02 • C+ and the deviation be- 
havior cr^rie^s = [1-2(1 + 0.18 • C- + 0.04 • C+)]^ Leav- 
ing out randomly a defined percentage of the observa- 
tions produces time series with random gaps and irregu- 
lar sampling steps. For each of these random realizations 
consisting of Uobs data points we compute the posterior 
densities pJi^j,^ (^|y) for i = 1, . . . , 50 realizations. 
The obtained averaged posterior densities {p{0\y))^ in 
the plane of the sample size Hots are shown in Fig|^ in- 
dicating with their maxima the averaged most probable 
change points (^)„ j_ • Apparently the mean of the pos- 
terior densities differs from the true value, however still 
within the width of the distribution. The latter depends 
invers proportionaly on the square root of the sample size 

width [{p{e\y))„^J cx — ^ . (26) 



parameter value 9 = 80. In any case, even for small 
data sets, as small as Uobs — 25, the non-flatness of the 
posterior clearly hints towards the existence of a change 
point in the time series. The investigation of the av- 
eraged marginal posterior densities in the plane of the 
remaining parameters reveals a broadening of the poste- 
rior distributions for riobs < 200, as naturally expected 
due to information loss in the sub time series considered 
in the inference process. 

Additionally we point out the efficiency of our method 
to infer on the explicit location of a singularity 01^^^^ for 
every single time series of the previous setting. In FigjS] 
are presented the histograms of the global point estima- 
tors dn^^^ for every single realization i = 1, . . . , 50. We 

observe that the particular global estimators are rel- 
atively robust to data loss and enable us to infer con- 
vincingly on the location of the singularity. Even consid- 
ering only 50% of the full time series, i.e. Uobs = 100, 
produces global estimates that lie in the narrow interval 
[76.0 , 83.5], representing < 4% of the full time grid. 
However, for such a data-poor situation, local additional, 
less dominant maxima are likely to appear due to ran- 
dom fluctuations in the posterior, and more sofisticated 
techniqes are needed to assess the existence of single or 
multiple change points. One approach to clearify mul- 
timodial posterior densities is the computation of local 
posterior densities within a sliding window as presented 
in the following. 



At large numbers of sampling points Uots the posterior 
converges towards a delta distribution located at the true 
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- - real change point 9=80 




Figure 4: The global maxima of the averaged posterior densi- 
ties {p{Q\y))^ ^ converge for increasing number of data points 
Uoba towards a delta distribution located at the true change 
point value ^ = 80. 




estimated change point estimated change point 9 



Figure 5: Histograms of the global change point estima- 
tors ^Qijj for i — 1,...,50 realizations and with respect to 
n-obs = 50, 100, 150, 200 data points from the setting of FigfS) 
Even for riots ~ 100 nearly all global estimates OIoq lie in the 
interval [76.0, 83.5], respectively < 4% of the full time grid. 



2. Local posterior density 

Long data sets are likely to contain more than one 
change point. So using our model globally may not be 
justified. However, locally our model assumption may 
still be valid. For this reason, we propose the following 
kernel based local posterior method. In addition this 
method allows us to treat very long data sets numerically 
more efficient since the computation scales with the the 
third power of the employed data points. Around each 
time point t we choose a data window It = [t — ,t + 
of length T. Inside this window, we take as prior 
distribution for the change point location p{9) a flat prior 
inside some subinterval of length a: 



position in form of 



Pie) 



for t - ^<9<t 



else 



< a < T. 



(27) 

We then compute the local posterior Pt{S\y\i^) around 
t based on the subseries in the data window y\i^. This 
yields a posterior distribution of a possible change point 
within each window under the assumption that there is 
actually a singularity within the window. In order to 
compare different window locations, we need to quantify 
the credibility that there is a change point. Therefore 
we compute the maximum of the Likelihood within each 
window 



f{t) — max 

6l6[t-i,t+i],si,S2eI 



^f3*,^;y\ij, 



(28) 



where a and /3* are the estimators given by Eq.(15) and 



(13 1. The global distribution of change points 6 given 



the full time series is then obtained as a weighted super- 



PiO\y)^C- / fit)pt{0\y\u)dt, 



(29) 



whereas the constant C ensures the normalization to a 
probability density. In subdata sets with no change point, 
the credibility of the model fit is very low, in conclusion 
the Likelihood maxima is of very small value and local 
estimates are judged as negligible. By construction the 
method works for multiple change points as soon as they 
are separated by at least one data window. We demon- 
strate this by applying our algorithm first on a synthetic 
single change point time series. In Figj6]is shown the sum 
of the local posterior densities weighted by the maxima 
of the local Likelihood (dashed curve) . The time series is 



one realization of the model in the previous Sect III A 1 
for a sequence of Uobs = 200 data points. Supplementary 
the applied window size riobs = 50 and the sampling grid 
of the change points ricp = 30 are presented for compar- 
ison. The sum of local posterior densities indicates the 
best model fit for windows covering the real change point 
6 ^ 80 but is non-zero even between [100, 121] suggest- 
ing that a change point model might be suitable for these 
singularity values as well. 

A second quantity that may be used to produce relative 
credibility weights for the windows is given by the Bayes 
factor [35]. Besides the goodness of fit, the complexity 
of the assumed model has to be taken into account to 
assess the most capable model describing the data and 
thus performing the estimation. Thus we test the hy- 
pothesis of no change point, respectively a linear model 
■Miin, against a change point model Mcjy in form of the 
Bayes factor 



BF[t) 



p{Mun\y\i^^ 
p{Mcp\yh) 



(30) 
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Figure 6: Normalized sum of local posterior densities weigted 
by the local Likelihood maxima (dashed) and with respect 
to the Bayes factor (solid), computed for sub time series of 
naub = 50 data points and a sampling grid of Ucp = 30 change 
points. The data is one realization of the time series defined 
in Sect jIII A l| for Uobs = 200 data points. 



The dependency of the Bayes factor on a logarithmic 
scale is shown in Fig[7]for the artifical time series of Figj6] 
The Bayes factor in this test case favors the change point 
over the linear model for all local windows, for which the 
true change point is in the support of the inner prior dis- 
tribution of 9. This local Bayes factor itself can be used as 
a diagnostic tool like the Likelihood weighted posterior, 
but we may also combine the techniques by using the BF 
as a window weighting function by setting f{t) = e"^''^^*^ 
in Eq.(29). In this form Eq.(29) corresponds therefore 



essentially to the total probability decomposition of the 
change point (cp) 



windows 



p{0\cp in window) p(cp exists in window) , 



(31) 

For comparison of both kernel approaches we present 
in Figj6] additionally the sum of local posterior densi- 
ties weighted by e~^^^'^ (solid curve). The distribution 
weighted with respect to the Bayes factor are non-zero 
in the range between [78,89] whereas the one weighted 
by the maxima of the Likelihood is non-zero in [78, 121]. 
The long tail of the latter hints to less probable change 
point locations which are automatically rejected in the 
Bayes factor weighting. Furthermore we exemplify the 
algorithm on a synthetic multi change point time series 
shown in FigjS] For clarity of presentation we plot the 
sum of posterior distributions weighted with the plain 
Bayes factor BF. We are able to infer on the true 
change point values (6*1,02,^3) = (40,100,160) via the 
estimators (^1,^2,^3) = (38.9,93.0,162.9) within their 
intervals ([33.9,47.8], [87.5, 109.1], [158.9, 167.0]) of about 
90% confidence. We obtain these intervals from a more 
detailed analysis of the partial sums of local posterior 



Figure 7: Local Bayes factor (squares) obtained for the time 
series in Figj6] The shaded area encloses values whose support 
for none of the models is substantial (based on [26]). Values 
underneath this area strongly support a change point against 
a linear model, and vice versa for values above. 



densities weighted by the factor e~^^ covering the esti- 
mated singularity locations. 

The main advantage of this localization approach even 
in a single change point context is however the enor- 
mous speedup of the computations. For instance for a 
time series of Uobs = 2000 data points we pass from a 
global computation of the marginalized posterior density 
in 3ft-4lTOm40s to a local one divided into 40 overlap- 
ping subdata sets of rigy^i, = 100 in TminAAs, respectively 



change point 
> 40,100,160 

— jBF(t) p,(»|y,„t)dt 




20 40 60 80 100 120 140 160 180 200 
change point vs. time 

Figure 8: Normalized sum of local posterior densities 
weighted by the Bayes factor, computed for sub time se- 
ries of Usub = 50 data points and a sampling grid of 
ricp = 30 change points. The parametrization of the mean 



is defined as Ff3 ^ U + 0.2 • + 0.1 • C+ 
^4.®" and the deviation is modeled as 
0.2 • C- + 0.03 ■ C+ - 0.05 • C+ " 



Cr + 0.3 



[1.6(1 



0.25 



+0.1. <r)]'- 
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a speed up of about 95%. This is achieved using Python 
2.6.5 on a Supermicro Intel(R) Core(TM)i7 CPU 920 @ 
2.68GHz with 12GB RAM. In the context of complex 
muhiple change point scenarios, as real time series mostly 
are, the localization approach of the posterior density 
p{0\y) combined with the Bayes factor realizes a pow- 
erfuU tool to scan the data seperately for single change 
points, as demonstrated in the following Sect|llIB| 



B. Annual Nile flow from 1871 to 1970 

We demonstrate our technique by applying it on a time 
series including a known significant change point. For 
this purpose we analyze the annual Nile River flow mea- 
sured at Aswan from 1871 to 1970 [27j. Several inves- 
tigation methods have verified a shift in the flow levels 
starting from the year 1899 [H [T9j |27] . Historical records 
provide the fact, that this shift is attributed partly to 
weather changes and partly to the start of construction 
work for a new dam at Aswan. Since we expect a natural 
behavior of the underlying mean we generalize our pre- 
vious model to undergo besides trend changes as well a 
sharp shift in the mean offset at the singularity 9. There- 
fore we modify the system matrix according to 



(32) 



whereas we define another type of Hockey sticks y?. and 
referring to Eq.([2|) and ([s]) not as linear but as con- 
stant. The general lormulas of the Bayesian inference 
remain the same, with these new functions. First of 
all we compute the global posterior density p(6', as 
presented in Eq.(23). By initially guessing a reasonable 



sampling grid for the change point 6 and the slope pa- 
rameters s from the data, we clearly obtain significant 
maxima in the posterior projections p{0\y) and p{s\y). 
Therefore we adjust the sampling grid to obtain finer 
posterior structures around the obvious maxima. We 
estimate the change point as 6* = 1898 within a confi- 
dence interval [1895, 1901] of over 95%. The slope pa- 
rameters of the deviation are estimated as (si,S2) — 
(0.0065, -0.0015) within the 90% confidence intervals h 
in [-0.0190,0.0450] and §2 in [-0.0065,0.0855]. 
Prior the estimators 9 and s we compute the poste- 
rior projections p(/3, 0, s|y) and p{a,9^s\y) formulated 
in Eq.(21 ) and (20). By minimizing the sampling grid of 



9 and s to its confidence intervals we are able to speed up 
the compuation and to estimate the remaining parame- 
ters (3 and a. Finally we reveal from the global posterior 
distribution the most probable model plotted in Fig[9] 
and listed in TabUll 

Additionally we investigate the time series for local singu- 
larities by computing the sum of local posterior densities 
weighted by the Bayes factor as e~^^ (displayed in Figjol) 
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Figure 9: Annual Nile flow containing a known change point 
at ^ = 1899. The sum of localized posterior densities weighted 
with respect to the Bayes factor BF indicates a change point 
a.t 9 = 1898 within its confidence interval [1896, 1900] of about 
90%. The estimated underlying model reveals the most dom- 
inant transition in the behavior of the mean. 



Table II: Estimated model of the annual Nile flux, 
parameter estimate confidence > 90% 



e 


1898 


[1895,1901] 




1.12 


[1.01,1.22] 




-0.0013 


[-0.0082,0.0057 


$2 


0.0006 


[-0.0011,0.0024 




0.82 


[0.76,0.90] 


a 


0.124 


[0.094,0.160] 


Sl 


0.0065 


[-0.0190,0.0450 


82 


-0.0016 


[-0.0065,0.0855 



for the window sizes risub — 50a of considered subseries. 
The change point sampling grid contains ricp — 30a in 
a resolution of = 0.5a. Since most secondary max- 
ima are < 1% we ignore them and therefore conclude 
on one global change point at = 1898 in the interval 
[1896, 1900] of about 90% confidence. Note that we inter- 
pret the splitting of the global maximum as an artefact 
from the high resolution of the numerical change point 
sampling — 0.5a. 

In conclusion, we are able to confirm previous investiga- 
tion techniques and auxiliary reveal further information 
from the parameter space of the multidimensional poste- 
rior density of the applied LMM. 



IV. CONCLUSIONS 

We introduce a general method for the detection of 
trend changes in heteroscedastic time series by describ- 
ing the observations as a linear mixed model. The change 
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point is thereby considered as an isolated singularity in 
a regular background of a signal, assuming partial linear 
mean and deviation in the first order approach. By ad- 
dressing the framework of linear mixed models we achieve 
to simplify the explicit computation of the marginal pos- 
terior distributions and thus reduce the computational 
time considerably. The formulation of the marginalized 
posterior densities of the model's parameters enables us 
to obtain inter alia the probability density of a change 
point given the data. Therefore the technique yields an 
insight in the parameter space of the underlying model, 
estimates these parameters and intrinsically provides a 
description of their confidence intervals. 
We elaborate our technique for single change point mod- 
els by infering on the relevant model parameters and dis- 
cuss the sensitivity of the singularity estimator with re- 
spect to data loss. Additionally we present a kernel based 
approach to investigate more complex time series with 
multiple change points by localizing the posterior den- 
sity and using the Bayes factor as a weighting function. 



Moreover we apply our algorithm on the annual flow vol- 
ume of the Nile River at Aswan from 1871 to 1970. We 
confirm a well-established transition in the year 1899 by 
the estimated change point at 1898 within the interval 
[1896,1900] of about 90% confidence. We specify the 
underlying model and identify the mean as the statisti- 
cal property undergoing the most significant transition. 
We conclude by emphasizing that our algorithm depicts 
a powerfuU tool to estimate the location of transitions 
in heteroscedastic time series and to infer on the under- 
lying behavior in a partial linear approach, meanwhile 
reducing the computational time. 
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